Hands-On Differential Privacy by Ethan Cowan, Mayana Pereira, and Michael Shoemate

Author: Ethan Cowan, Mayana Pereira, and Michael Shoemate
Language: eng
Format: epub
Publisher: O'Reilly Media, Inc.
Published: 2022-12-26T00:00:00+00:00


In 2006, AOL released an anonymized dataset of search activity from its service. This sample contained 20 million queries made by more than 650,000 users over three months. Although the usernames were obfuscated, many of the search queries themselves contained personally identifiable information. As a result, several users were identified and matched to their accounts and search histories.

This release led to the resignation of two senior staff members and a class action lawsuit that was settled in 2013. It also caused enormous harm to AOL’s public image and exposed the identities of real people who were using the service on the assumption that their privacy would be protected.

The AOL release is an example of an event-level database, where each row represents an action and an individual may contribute multiple rows. Privatizing such databases requires different approaches than those used for the databases we have encountered so far.

The AOL release demonstrates that you need to be careful when dealing with event-level data. The privacy leak could have been avoided with differential privacy, but not in the way you’ve been using it. Why not? Previous chapters have considered scenarios where each row in a dataset represents a unique individual. This chapter shows how database characteristics affect differential privacy and how to adapt the concepts introduced so far to different database types. Differential privacy guarantees always rely on the concept of neighboring databases. Up until now, neighboring databases have differed by exactly one row, and that row has represented a single individual.

What happens when a database contains multiple rows that correspond to a single individual? This is common in many practical scenarios, such as hospital visit databases, network log databases, and others. In such cases, a straightforward application of the concept of neighboring databases and function sensitivity used in the previous chapters won’t provide differential privacy guarantees.
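To see why the earlier reasoning breaks down, consider a row count. The sketch below is illustrative only and is not taken from the book: the `events` table, the `user_id` field, and all helper names are hypothetical. It shows that removing one individual can change the count by more than one, and then applies one common remedy, capping each user's contribution before adding Laplace noise scaled to that cap. The noise sampler uses plain floating-point arithmetic and is not a hardened implementation.

```python
import random
from collections import Counter

# Hypothetical event-level table: one row per search, several rows per user.
events = [
    {"user_id": "u1", "query": "weather boston"},
    {"user_id": "u2", "query": "pizza near me"},
    {"user_id": "u1", "query": "flights to denver"},
    {"user_id": "u1", "query": "dog groomer"},
    {"user_id": "u3", "query": "tax forms"},
]


def remove_user(rows, user_id):
    """Neighboring database at the user level: drop every row of one user."""
    return [row for row in rows if row["user_id"] != user_id]


# How much does the row count change when a single user is removed?
# This is only the change observed on this table. The true sensitivity is a
# worst case over all tables and is unbounded unless contributions are capped.
users = {row["user_id"] for row in events}
max_change = max(len(events) - len(remove_user(events, u)) for u in users)


def bound_contributions(rows, max_rows_per_user):
    """Keep at most max_rows_per_user rows for each user (first come, first kept)."""
    kept, seen = [], Counter()
    for row in rows:
        if seen[row["user_id"]] < max_rows_per_user:
            kept.append(row)
            seen[row["user_id"]] += 1
    return kept


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(rows, max_rows_per_user, epsilon, rng):
    """Noisy row count; after capping, one user changes the count by at most the cap."""
    bounded = bound_contributions(rows, max_rows_per_user)
    sensitivity = max_rows_per_user
    return len(bounded) + laplace_noise(sensitivity / epsilon, rng)


noisy = private_count(events, max_rows_per_user=2, epsilon=1.0, rng=random.Random(0))
```

On this table, removing `u1` changes the count by three, so noise calibrated to a sensitivity of one would understate that user's influence. Capping trades some bias (dropped rows) for a known, finite sensitivity.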

This chapter explores the differences between user-level and event-level databases and the challenges you face when designing a differential privacy project on an event-level database.
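As a rough illustration of the two layouts, the following sketch collapses a small event-level table into a user-level one, where each individual occupies exactly one row. The table contents, field names, and the `to_user_level` helper are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical event-level table: each row is one action (a search).
events = [
    {"user_id": "u1", "query": "weather boston"},
    {"user_id": "u2", "query": "pizza near me"},
    {"user_id": "u1", "query": "flights to denver"},
    {"user_id": "u1", "query": "dog groomer"},
    {"user_id": "u3", "query": "tax forms"},
]


def to_user_level(rows):
    """Group events by user so that each individual maps to a single row."""
    queries_by_user = defaultdict(list)
    for row in rows:
        queries_by_user[row["user_id"]].append(row["query"])
    return [
        {"user_id": user_id, "num_queries": len(queries)}
        for user_id, queries in queries_by_user.items()
    ]


user_rows = to_user_level(events)
```

In `events`, the individual `u1` appears in three rows; in `user_rows`, the same individual appears once, which matches the one-row-per-person setting assumed in earlier chapters.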

By the end of this chapter, you should:

Understand the main differences between user-level databases and event-level databases


